library(ggplot2)
library(dplyr)
library(rmarkdown)
library(knitr)
library(GEOquery)
library(SummarizedExperiment)
library(DESeq2)
library(here)
library(limma)
library(nlme)
library(MASS)
library(multcomp)
library(pheatmap)
library(variancePartition)
library(lme4)
library("r2glmm")
library(mclust)
library(gprofiler2)
library(enrichplot)
The development of effective vaccines to prevent the spread of infectious disease is one of the greatest public health achievements of the 20th century. While effective vaccines are available for most widespread infectious diseases, the exact efficacy of the vaccines in terms of individual protection against acquiring disease are variable based on the disease itself and the vaccine platform. A strong area of interest in vaccine development is identifying technologies and strategies for increasing the strength and duration of immune response to vaccination. Influenza vaccines in particular are known to have limited efficacy and duration, and developing a long-lasting universal influenza vaccine is a top target of the NIH’s National Institute for Allergy and Infectious Diseases (NIAID).
One way to measure a vaccine’s impact in an individual is to measure specific immune products, called correlates of protection (CoP), that are known indicators of a vaccine’s efficacy. In influenza vaccines, common CoPs are Hemagglutination Inhibition (HAI) and Neutralizing (Neut) antibodies. These antibodies are the end products of the complex immune response inititated at the vaccination event. One goal among vaccine experts currently is to better understand the underlying differences between individuals who have a robust immune response to a particular vaccine (high antibody titers, indicating low likelihood of infection) versus those who have little to no response to the same stimulus. One way to examine for the underlying differences in biological processes is to take a global look at gene expression in response to vaccination, and to identify genes or pathways that are differentially expressed between high and low responders. This is the approach that Bucasas et. al. took. In our project, we will replicate the analysis approach using data accessed from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO).
Data for the study on trivalent influenza vaccine are deposited in National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) with accession GSE48024. The manuscript describing results found by the study group in the male cohort was published in 2011. Data was published to the NCBI GEO in 2013.
A total of 119 males and 128 females between 19 to 41 years of age were recruited in the study and vaccinated with trivalent influenza vaccine. Whole blood samples were drawn before vaccination (day 0) and at three time points after vaccination (day 1, 3 and 14). All male samples are sequenced using Illumina HumanHT-12 V3.0 expression beadchip and all female samples are sequence using Illumina HumanHT-12 V4.0 expression beadchip.
Two separate datasets are loaded, one for female cohort and another one for male cohort.
gset <- getGEO("GSE48024", GSEMatrix =TRUE, AnnotGPL=TRUE)
## Found 2 file(s)
## GSE48024-GPL10558_series_matrix.txt.gz
## GSE48024-GPL6947_series_matrix.txt.gz
gset_female <- gset[[1]]
gset_male <- gset[[2]]
# Check gender
unique(gset_female$`gender:ch1`)
## [1] "female"
# Check gender
unique(gset_male$`gender:ch1`)
## [1] "male"
# Check sequencing platform
unique(gset_female$platform_id) #Illumina HumanHT-12 V4.0 expression beadchip
## [1] "GPL10558"
# Check sequencing platform
unique(gset_male$platform_id) #Illumina HumanHT-12 V3.0 expression beadchip
## [1] "GPL6947"
Based on the data loaded directly from GEO, 116 out of 119 males and 110 out of 128 females are available. A total of 417 samples are from female cohort and 431 samples are from male cohort.
# Check number of subjects available in data
length(unique(gset_female$`subject:ch1`))
## [1] 110
# Check number of subjects available in data
length(unique(gset_male$`subject:ch1`))
## [1] 116
# Check number of samples available in data
ncol(gset_female)
## Samples
## 417
# Check number of samples available in data
ncol(gset_male)
## Samples
## 431
There are 47276 probes used for female cohort, and 48742 probes used for male cohort.
# Check number of probes
nrow(gset_female)
## Features
## 47276
# Check number of probes
nrow(gset_male)
## Features
## 48742
Data has been processed and normalized according to description given in GEO. We randomly select 10 samples from each cohort to check the distribution of normalized data.
# Data processing + normalization
unique(gset_male$data_processing)
## [1] "Raw data underwent background adjustment, variance stabilization transformation, and robust spline normalization using R package lumi (available from Bioconductor) and function lumiExpresso()"
# Data processing + normalization
unique(gset_female$data_processing)
## [1] "Raw data underwent background adjustment, variance stabilization transformation, and robust spline normalization using R package lumi (available from Bioconductor) and function lumiExpresso()"
boxplot(exprs(gset_male)[,sample(1:431, 10, replace=F)],
main = "Ten random samples from male cohort", ylab = "Expression")
boxplot(exprs(gset_female)[,sample(1:417, 10, replace=F)],
main = "Ten random samples from female cohort", ylab = "Expression")
Last, we check how many samples at each sampling time for each cohort.
# For female
table(gset_female$`time:ch1`)
##
## Day0 Day1 Day14 Day3
## 107 107 98 105
# For male
table(gset_male$`time:ch1`)
##
## Day0 Day1 Day14 Day3
## 111 110 109 101
Because female and male cohorts are sequenced using different platforms, all further data processing and analysis will be done separately.
First to setup SummarizedExperiment object:
SE_male <- SummarizedExperiment(assay = list("exprs" = exprs(gset_male)),
colData = as(gset_male@phenoData, "data.frame"),
rowData = as(gset_male@featureData, "data.frame"))
SE_female <- SummarizedExperiment(assay = list("exprs" = exprs(gset_female)),
colData = as(gset_female@phenoData, "data.frame"),
rowData = as(gset_female@featureData, "data.frame"))
# Create few variables with better names
SE_male$time <- factor(SE_male$`time:ch1`, levels = c("Day0", "Day1", "Day3", "Day14"))
SE_male$ID <- SE_male$`subject:ch1`
SE_female$time <- factor(SE_female$`time:ch1`, levels = c("Day0", "Day1", "Day3", "Day14"))
SE_female$ID <- SE_female$`subject:ch1`
As shown in Data Collection section, not all participants are sampled at all four time points. Thus, we keep those who sampled at all time.
male.all <- as.data.frame(table(SE_male$ID)) %>% filter(Freq == 4) %>% rename( "Var1" = "ID")
female.all <- as.data.frame(table(SE_female$ID)) %>% filter(Freq == 4) %>% rename( "Var1" = "ID")
SE_male <- SE_male[, which(SE_male$ID %in% male.all$ID)]
SE_female <- SE_female[, which(SE_female$ID %in% female.all$ID)]
There are 92 males and 87 females sampled at all four time points. These will be used for downstream analysis.
table(SE_male$time)
##
## Day0 Day1 Day3 Day14
## 92 92 92 92
table(SE_female$time)
##
## Day0 Day1 Day3 Day14
## 87 87 87 87
Probes for male cohort reduces from 48742 to 29228.
Probes for female cohort reduces from 47276 to 31243.
gset_male <- gset_male[which(gset_male@featureData@data[["Gene symbol"]] != ""), ]
nrow(gset_male)
## Features
## 29228
gset_female <- gset_female[which(gset_female@featureData@data[["Gene symbol"]] != ""), ]
nrow(gset_female)
## Features
## 31243
Using limma() package.
Probes for male cohort reduces from 29228 to 19590.
Probes for female cohort reduces from 31243 to 20751.
dupRM_male <- avereps(assays(SE_male)$exprs, ID=rowData(SE_male)$"Gene symbol")
nrow(dupRM_male)
## [1] 19590
dupRM_female <- avereps(assays(SE_female)$exprs, ID=rowData(SE_female)$"Gene symbol")
nrow(dupRM_female)
## [1] 20751
Update SummarizeExperiment object with expression matrix where duplicated genes’ expression are averaged.
# Make sure samples are ordered the same
all.equal(colnames(dupRM_male) , rownames(SE_male@colData))
## [1] TRUE
SE_male_dupRM <- SummarizedExperiment(assay = list("exprs" = dupRM_male),
colData = SE_male@colData)
all.equal(colnames(dupRM_female) , rownames(SE_female@colData))
## [1] TRUE
SE_female_dupRM <- SummarizedExperiment(assay = list("exprs" = dupRM_female),
colData = SE_female@colData)
Probes for male cohort reduces from 19590 to 14692.
Probes for female cohort from 20751 to 15563.
rowData(SE_male_dupRM)$gene_variance <- rowVars(assays(SE_male_dupRM)$exprs)
quantile(rowData(SE_male_dupRM)$gene_variance)
## 0% 25% 50% 75% 100%
## 5.479204e-05 2.238396e-03 3.938472e-03 1.344815e-02 2.601662e+00
SE_male_dupRM_25up <- SE_male_dupRM[which(rowData(SE_male_dupRM)$gene_variance > 0.0022386139), ]
rowData(SE_female_dupRM)$gene_variance <- rowVars(assays(SE_female_dupRM)$exprs)
quantile(rowData(SE_female_dupRM)$gene_variance)
## 0% 25% 50% 75% 100%
## 0.0002699308 0.0070110061 0.0134392322 0.0334073424 4.5899802389
SE_female_dupRM_25up <- SE_female_dupRM[which(rowData(SE_female_dupRM)$gene_variance > 0.0070110061), ]
For downstream analysis, we use 14692 unique genes and 368 samples from the male cohort, and 15563 genes with 348 samples from the female cohort.
SE_male_dupRM_25up
## class: SummarizedExperiment
## dim: 14692 368
## metadata(0):
## assays(1): exprs
## rownames(14692): EEF1A1 GAPDH ... MCM10 ZNF703
## rowData names(1): gene_variance
## colnames(368): GSM1165057 GSM1165058 ... GSM1165486 GSM1165487
## colData names(43): title geo_accession ... time ID
SE_female_dupRM_25up
## class: SummarizedExperiment
## dim: 15563 348
## metadata(0):
## assays(1): exprs
## rownames(15563): EEF1A1 GAPDH ... MIR320C1 LINC00173
## rowData names(1): gene_variance
## colnames(348): GSM1165512 GSM1165513 ... GSM1165927 GSM1165928
## colData names(43): title geo_accession ... time ID
Data regarding informaiton on antibody titer are deposited in dbGaP with restricted access. Thus, we use 11 differentially expressed genes with respect to response given by authors to differentiate high response and low response after vaccination among participants. Please not that these 11 genes are found using male cohort.
Gene PRDX2 are not present in both female and male cohort after data pre-processing.
DE11_response <- c("STAT1", "IRF9", "SPI1", "CD74", "HLA-E", "TNFSF13B", "PRDX2", "PRDX3", "E2F2", "PTEN", "ITGB1")
DE11_response[!DE11_response %in% rownames(SE_male_dupRM_25up)]
## [1] "PRDX2"
DE11_response[!DE11_response %in% rownames(SE_female_dupRM_25up)]
## [1] "PRDX2"
We will cluster participants based on the difference between day 1 and day 0 expression of the available 10 genes.
male_exprs_day1_0 <- list( day1 = SE_male_dupRM_25up[which(rownames(SE_male_dupRM_25up) %in% DE11_response),which(SE_male_dupRM_25up$time == "Day1")],
day0 = SE_male_dupRM_25up[which(rownames(SE_male_dupRM_25up) %in% DE11_response),which(SE_male_dupRM_25up$time == "Day0")])
colnames(male_exprs_day1_0[["day1"]]) <- male_exprs_day1_0[["day1"]]$ID
colnames(male_exprs_day1_0[["day0"]]) <- male_exprs_day1_0[["day0"]]$ID
male_exprs_day1_0[["day1"]] <- as.matrix(assay(male_exprs_day1_0[["day1"]]))
male_exprs_day1_0[["day0"]] <- as.matrix(assay(male_exprs_day1_0[["day0"]]))
dim(male_exprs_day1_0[[1]])
## [1] 10 92
dim(male_exprs_day1_0[[2]])
## [1] 10 92
# Match the order of gene and order of participants
male_exprs_day1_0[[2]] <- male_exprs_day1_0[[2]][match(rownames(male_exprs_day1_0[[1]]), rownames(male_exprs_day1_0[[2]])), match(colnames(male_exprs_day1_0[[1]]), colnames(male_exprs_day1_0[[2]]))]
# Subtract day 0 from day 1 for each participants
male_diff_day1_0 <- (male_exprs_day1_0[["day1"]] - male_exprs_day1_0[["day0"]])
male_diff_day1_0[,1:5]
## IN0123 IN0126 IN0127 IN0128 IN0129
## STAT1 1.31173507 1.52068176 1.02966207 1.22048660 1.33060734
## SPI1 0.12843323 -0.08736426 -0.49973666 0.07800277 0.56261461
## PTEN -0.26807272 0.31520774 -0.45989548 0.23892477 0.35092521
## ITGB1 -0.03593713 0.03315031 -0.28064441 -0.18637363 -0.09974113
## CD74 0.53791679 -0.11917102 0.30187478 0.32976888 0.45044398
## IRF9 0.05764101 0.91409226 0.59053525 0.74820827 0.52649910
## TNFSF13B -0.18962517 0.53251828 0.94899337 0.36595157 0.41775762
## HLA-E -0.07087181 0.35358275 0.50044553 1.16363470 0.05931064
## E2F2 0.34867212 0.58911630 -0.04331496 -0.01942406 0.29831446
## PRDX3 -0.03653738 -0.03568729 -0.42529887 -0.09805881 -0.14152018
# GMM cluster
GMM_male <- Mclust(t(male_diff_day1_0), G = 2)
table(GMM_male$classification)
##
## 1 2
## 27 65
According to authors, genes (“STAT1”, “IRF9”, “SPI1”, “CD74”, “HLA-E”, “TNFSF13B”) are up-regulated in high responders, and remaining genes in the list are up-regulated in low responders.
We assign group 2 as high responders based on boxplots on the difference of expression between day 1 and day 0 by clusters from GMM.
# Gene names in Table 1
names_table1 <- c("STAT1", "IRF9", "SPI1", "CD74", "HLA-E", "TNFSF13B", "PRDX3", "E2F2", "PTEN", "ITGB1")
# Create Table1
table1 <- data.frame(Gene=names_table1, "Upregulated in"=c("High responders", "High responders","High responders","High responders","High responders","High responders","Low responders","Low responders","Low responders","Low responders"))
## Subset of Transcripts Used for prediction of the TRI
table1
## Gene Upregulated.in
## 1 STAT1 High responders
## 2 IRF9 High responders
## 3 SPI1 High responders
## 4 CD74 High responders
## 5 HLA-E High responders
## 6 TNFSF13B High responders
## 7 PRDX3 Low responders
## 8 E2F2 Low responders
## 9 PTEN Low responders
## 10 ITGB1 Low responders
# Boxplot of diff(day1-day0) by clustered group across the genes in Table 1
par(mfrow=c(2,5))
for (i in 1:length(names_table1)){
dff <- data.frame(expr = male_diff_day1_0[names_table1[i],], group=GMM_male$classification)
boxplot(expr ~group,dff, main=names_table1[i], ylab="diff(day1- day0)")
}
par(mfrow=c(1,1))
SE_male_dupRM_25up$response <- factor(GMM_male$classification[match(SE_male_dupRM_25up$ID, names(GMM_male$classification))], levels = c(1,2))
Code are not shown in html to reduce redundancy, please refer to Rmd file for code.
GMM result.
##
## 1 2
## 48 39
Overall, we assign group 2 as high responders and group 1 as low responders for female cohort based on boxplots.
We first define few functions used for analysis.
mixed_test <- function(gene,gene_name, covariates, fml, var_rowname, j){
df_test <- cbind(gene , covariates)
colnames(df_test)[1] <- "gene"
mix_model <- aov(fml, data=df_test)
df_ret <- data.frame(unlist(summary(mix_model))[paste0("Error: Within.Pr(>F)",j)])
rownames(df_ret) <- var_rowname
colnames(df_ret) <- gene_name
return(tibble(df_ret))
}
GO_analysis <- function(genes){
DE <- gost(query = genes, organism = "hsapiens", multi_query = FALSE,
correction_method = "fdr", user_threshold = 0.01,
significant = TRUE,
source = "GO:BP")
print(table(DE[["result"]][["source"]]))
DE_mod = DE$result[,c("query", "term_id",
"term_name", "p_value", "query_size",
"intersection_size", "term_size",
"effective_domain_size", "intersection_size", "source")]
DE_mod$"GeneRatio" = paste0(DE_mod$intersection_size, "/", DE_mod$query_size)
DE_mod$BgRatio = paste0(DE_mod$term_size, "/", DE_mod$effective_domain_size)
DE_mod$"pvalue" <- DE_mod$p_value
DE_mod$Count <- DE_mod$intersection_size
DE_mod$Description<- DE_mod$term_name
DE_mod_enrich <- new("enrichResult", result = DE_mod)
dotplot(DE_mod_enrich , x = 'GeneRatio', color = "pvalue") + ggtitle("GO: Biological process")
}
We first fit mixed effect model, then extract p-value from testing the fix effect on time which is treated as categorical variable.
cov_male <- data.frame(ID = SE_male_dupRM_25up$ID,
time = SE_male_dupRM_25up$time,
response = SE_male_dupRM_25up$response)
genes_male <- rownames(SE_male_dupRM_25up)
mix_test_time_male <- sapply( 1: nrow(SE_male_dupRM_25up),
function(i) mixed_test( gene = as.vector(SE_male_dupRM_25up@assays@data@listData[["exprs"]][i,]),
gene_name = genes_male[i],
covariates = cov_male,
fml = as.formula(gene ~ time + Error(ID)),
var_rowname = c("time"),
j = 1))
mix_test_time_male <- as.data.frame(unlist(mix_test_time_male))
colnames(mix_test_time_male) <- c("pvalue")
hist(mix_test_time_male$pvalue, main ="P-value from fix effect of time", xlab="P-value")
P-values are adjusted using Benjamini-Hotchberg (BH) method.
# BH adjusted p-value
mix_test_time_male$BH_pvalue <- p.adjust(mix_test_time_male$pvalue, method = "BH", n = length(mix_test_time_male$pvalue))
hist(mix_test_time_male$BH_pvalue, main ="BH adjusted p-value from fix effect of time", xlab="BH adjusted p-value")
Controlling for 1% of false discovery rate, there are 4540 differentially expressed (DE) genes across time in the male cohort.
DEtime_male <- mix_test_time_male %>% as.data.frame() %>% filter(BH_pvalue < 0.01)
nrow(DEtime_male)
## [1] 4540
Next, we plot heatmap on top 50 DE genes by BH adjusted p-value. For male cohort, the top 50 DE genes are predominantly up-regulated at day 1 after vaccination. This is consistent with what authors reported in their paper.
DE50_male <- SE_male_dupRM_25up[which(rownames(SE_male_dupRM_25up) %in% rownames(DEtime_male %>% arrange(BH_pvalue))[1:50]), ]
colData(DE50_male)$time <- factor(colData(DE50_male)$time, levels = c("Day0", "Day1", "Day3", "Day14"))
DE50_male_mean <- sapply(c("Day0", "Day1", "Day3", "Day14"), function(day) rowMeans(assay(DE50_male)[, which(colData(DE50_male)$time == day)]))
pheatmap(DE50_male_mean, cluster_rows = TRUE, cluster_cols = FALSE, scale = "row",
main = "Mean expression of top 50 DE genes wrt time by BH adjusted p-values")
Next, we perform Gene Ontology enrichment analysis on the 4540 DE genes. Specifically, we are interested in biological process.
There are 1550 GO terms showed significant association with 4540 DE genes found with respect to time. Some of the top GO term related to immune response and stress are what we expected after vaccination.
GO_analysis(rownames(DEtime_male))
##
## GO:BP
## 1550
Variable response is inferred using GMM in previous section.
mix_test_response_time_male <- sapply( 1: nrow(SE_male_dupRM_25up),
function(i) mixed_test( gene = as.vector(SE_male_dupRM_25up@assays@data@listData[["exprs"]][i,]),
gene_name = genes_male[i],
covariates = cov_male,
fml = as.formula(gene ~ time*response + Error(ID)),
var_rowname = c("time"),
j = 2))
mix_test_response_time_male <- as.data.frame(unlist(mix_test_response_time_male))
colnames(mix_test_response_time_male) <- c("pvalue")
hist(mix_test_response_time_male$pvalue, main ="P-value from fix effect of time*response", xlab="P-value")
# BH adjusted p-value
mix_test_response_time_male$BH_pvalue <- p.adjust(mix_test_response_time_male$pvalue, method = "BH", n = length(mix_test_response_time_male$pvalue))
hist(mix_test_response_time_male$BH_pvalue, main ="BH adjusted p-value from fix effect of time*response", xlab="BH adjusted p-value")
Controlling for 5% of false discovery rate, there are 573 differentially expressed (DE) genes across time in the male cohort.
DEresponse_male <- mix_test_response_time_male %>% as.data.frame() %>% filter(BH_pvalue < 0.05)
nrow(DEresponse_male)
## [1] 573
We plot heatmap on top 50 DE genes by BH adjusted p-value. For male cohort, the top 50 DE genes are predominantly up-regulated at day 1 among high responders after vaccination. There are few genes that are down-regulated at day 1among high responders after vaccination.
DE50response_male <- SE_male_dupRM_25up[which(rownames(SE_male_dupRM_25up) %in% rownames(DEresponse_male %>% arrange(BH_pvalue))[1:50]), ]
colData(DE50response_male)$time <- factor(colData(DE50response_male)$time, levels = c("Day0", "Day1", "Day3", "Day14"))
DE_response1_mean_male <- sapply(c( "Day0", "Day1", "Day3", "Day14"), function(day) rowMeans(assay(
DE50response_male)[, which(colData(
DE50response_male)$time == day & DE50response_male$response == 1)]))
colnames(DE_response1_mean_male) <- paste0(colnames(DE_response1_mean_male), "_low")
DE_response2_mean_male <- sapply(c( "Day0", "Day1", "Day3", "Day14"), function(day) rowMeans(assay(
DE50response_male)[, which(colData(
DE50response_male)$time == day & DE50response_male$response == 2)]))
colnames(DE_response2_mean_male) <- paste0(colnames(DE_response2_mean_male), "_high")
Group_male <- data.frame( Group = c(rep("Low responder",4), rep("High responder", 4)))
rownames(Group_male) <- c(colnames(DE_response1_mean_male), colnames(DE_response2_mean_male))
pheatmap(cbind(DE_response1_mean_male,DE_response2_mean_male),
cluster_rows = TRUE, cluster_cols = FALSE, scale = "row",
main = "Mean expression on top 50 DE genes wrt response over time",
annotation_col = Group_male)
From GO enrichment analysis, there are 483 GO terms showed significant association with 573 DE genes found with respect to response over time. The top GO terms are mostly related to immune response.
GO_analysis(rownames(DEresponse_male))
##
## GO:BP
## 483
Code for analysis on female cohort are not shown in html, please refer to Rmd file for code.
We fit mixed effect model, and extract p-value on testing the fixed effect of interaction between response and time. P-values are further adjusted using Benjamini-Hotchberg method.
Controlling for 1% of false discovery rate, there are 6118 differentially expressed (DE) genes across time in the male cohort.
## [1] 6118
We plot heatmap on top 50 DE genes by BH adjusted p-value. For female cohort, the top 50 DE genes are all up-regulated at day 1 after vaccination in female cohort.
There are 1682 GO terms showed significant association with 6118 DE genes found with respect to time. Some of the top GO term related to immune response and stress are what we expected after vaccination.
##
## GO:BP
## 1682
Response to vaccination is inferred using GMM in previous section. We fit a mixed effect model including interaction between response and time as fixed effect. Then p-values on testing the effect of the interaction are extracted, then adjusted using Benjamini-Hotchberg method.
Controlling for 5% of false discovery rate, there are 3217 differentially expressed (DE) genes with respect to response across time in the female cohort.
## [1] 3217
From the heatmap on top 50 DE genes by BH adjusted p-value, majority of the DE genes are down-regulated at all time points among high responders after vaccination. This is different than the result shown for male analysis.
There are 730 GO terms showed significant association with 3217 DE genes found with respect to response over time. The top GO terms are different as compared to the analysis on male cohort.
##
## GO:BP
## 730
The overarching results expressed in the original paper were replicated well for the male cohort, despite having incomplete information on actual response status. Differential expression peaked on Day 1 post vaccination, and the majority of genes that were differentially expressed when accounting for response status were related to immunological biological processes. This result was not consistent in the female cohort, where subjects assigned a high response showed over-expression at Day 0 compared to post-vaccination, and none of the top Gene Ontology terms related to biological process involved the immune response. This may indicate a difference in sexes, but is more likely an artifact of misclassification errors, as the 11 genes used to predict response category were identified in the male data. The response patterns over time, without respect to response status, look similar in terms of expression changes, significance, and GO terms, between males and females.
The two main takeaways from this data are the overall timing of the expression patterns, and potential pathways to examine to see if they can be targeted in the development of novel vaccines or adjuvants (molecules co-administered with vaccines to enhance immune response). In particular, the results of this study indicate that sampling for gene expression at Day 1 post vaccination could be a useful addition in early-phase vaccine studies, as expression at this timepoint gives the most information about the pathways triggered, and could in extreme cases (such as future pandemics) provide immediate insight into vaccine efficacy, should expression correlates be well established.